ABSTRACT

We discuss the construction of a photometric redshift catalogue of Luminous Red 
Galaxies (LRGs) from the Sloan Digital Sky Survey (SDSS), emphasizing the
principal steps necessary for constructing such a catalogue: (i) photometrically
selecting the sample, (ii) measuring photometric redshifts and their error
distributions, and (iii) estimating the true redshift distribution. We compare
two photometric redshift algorithms for these data and find that they give
comparable results. Calibrating against
the SDSS and SDSS-2dF spectroscopic surveys, we find that the photometric redshift 
accuracy is $\sigma \sim 0.03$ for redshifts less than 0.55 and worsens at higher redshift (~ 0.06
for z < 0.7). These errors are caused by photometric scatter, as well as systematic er- 
rors in the templates, filter curves, and photometric zeropoints. We also parametrize 
the photometric redshift error distribution with a sum of Gaussians, and use this 
model to deconvolve the errors from the measured photometric redshift distribution 
to estimate the true redshift distribution. We pay special attention to the stability of 
this deconvolution, regularizing the method with a prior on the smoothness of the true 
redshift distribution. The methods we develop are applicable to general photometric 
redshift surveys. 



1 INTRODUCTION 



Since their inception, photometric redshifts (Koo 1985; Connolly et al. 1995;
Gwyn & Hartwick 1996; Sawicki et al. 1997; Hogg et al. 1998; Benitez 2000;
Bolzonella et al. 2000; Csabai et al. 2000; Budavari et al. 2001;
Collister & Lahav 2004) have provided a possible solution to the major
limitation of large redshift surveys - that they are severely limited
both in depth and area by the throughput of spectrographs.
Photometric redshift algorithms essentially define a map- 
ping from the observed photometric properties of galaxies 
to their redshifts and other physical properties such as lu- 
minosity and type. Given an accurate photometric redshift 
algorithm, one can map the observable Universe in three 
dimensions just by imaging in carefully chosen passbands. 
The relative efficiency of imaging compared to spectroscopy 
allows one to both go deeper and cover a larger area than 
is possible with traditional redshift surveys. Such surveys 
would be invaluable both for studies of large scale struc- 
ture, as well as understanding the evolution of galaxies. In 
addition, imaging surveys with well understood redshift dis- 
tributions are essential for efforts to directly map the matter 
distribution using weak lensing. 

Defining a photometric redshift catalogue involves ful- 
filling two requirements: one must photometrically specify 
a population of galaxies for which reliable photometric redshifts can be
obtained, and one must characterize the photometric redshift error
distribution. Demonstrating this process is the purpose of this work, using
the five colour imaging of the Sloan Digital Sky Survey (SDSS; York et al. 2000)
as an example.

Luminous Red Galaxies (LRGs) have long been rec- 
ognized as a promising population for the application 
of photometric redshifts (Hamilton 1985; Gladders & Yee 2000;
Eisenstein et al. 2001; Willis et al. 2001). These galaxies have remarkably
uniform spectral energy distributions (SEDs; Schneider et al. 1983;
Eisenstein et al. 2003) that are characterized by a strong break at 4000 Å
caused by the
accumulation of a number of metal lines. The redshifting of 
this feature through different filters gives these galaxies their 
characteristic red colours that are strongly correlated with 
redshift. This makes it easy to select these galaxies and to 
estimate photometric redshifts. In addition, these are among 
the most luminous galaxies in the Universe, and map large 
cosmological volumes. Furthermore, LRGs are strongly cor- 
related with clusters, making them an ideal tool for detecting 
and studying clusters. All of the above make LRGs an as- 
trophysically interesting sample and an ideal candidate for 
a photometric redshift survey. 

Measuring the photometric redshift error distribution 
requires a calibration set of spectroscopic redshifts that span 
a similar colour and magnitude range as the photomet- 
ric catalogue. We use two redshift catalogues to calibrate 
the LRG photometric redshifts, the SDSS Data Release 1
(Abazajian et al. 2003)^1 LRG spectroscopic catalogue for
redshifts < 0.4, and the SDSS-2dF LRG spectroscopic catalogue
(Cannon et al. 2003) for redshifts between 0.4 and
0.7. These catalogues have extremely good coverage of the 
LRG colour and magnitude selection criteria by design; the 
selection criteria we use have been strongly influenced by
both these catalogues. In addition, we supplement the low 
redshift catalogue with the SDSS MAIN galaxy catalogue 
complete to an r band magnitude of 17.77. 

^1 We note that Data Release 1 here only refers to the area coverage;
the reduction pipelines used are identical with those for DR2 and DR3.
In particular, the model magnitude bug in the DR1 reductions does not
affect this paper.

A generic problem in interpreting analyses with photometric redshifts is
estimating the conditional probability distribution,
$P(z_{\rm spectro}|z_{\rm photo})$, as this allows us to connect
the measurement - the photometric redshift - with the phys- 
ical quantity - the actual redshift of the galaxy. This abil- 
ity to connect photometric redshifts with actual redshifts 
is essential to theoretically interpret results derived from 
photometric surveys, and generically will be a significant 
source of systematic error. The simplest way to measure 
$P(z_{\rm spectro}|z_{\rm photo})$ is to directly measure it from a calibration
data set. Unfortunately, $P(z_{\rm spectro}|z_{\rm photo})$ depends on
the underlying redshift distribution, and therefore to obtain 
unbiased results, the calibration data and the actual data 
must sample the same redshift distribution. This is quite 
often not the case, since calibration data are drawn from 
heterogeneous sources. We also note that simulations cannot
solve this problem, since the $P(z_{\rm spectro}|z_{\rm photo})$ derived will
depend on the simulated redshift distribution, which might 
differ significantly from the true distribution. 

The approach that we favour in this paper is 
to use Bayes' theorem to relate $P(z_{\rm spectro}|z_{\rm photo})$ to
$P(z_{\rm photo}|z_{\rm spectro})$, using the true redshift distribution,
dN/dz, of the photometric sample. For samples selected
only with an apparent magnitude cut, one can estimate
dN/dz directly from the galaxy luminosity function (e.g.
Budavari et al. 2003). This approach is significantly harder
for samples, like the ones considered in this paper, that 
involve multiple magnitude and colour cuts, as it involves 
the joint luminosity-colour distribution functions that are 
poorly understood. 
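Written out, the relation this approach relies on is the following. This is a sketch in our notation; the dummy variable $z'$ is introduced here only for the normalization, which is simply the observed photometric redshift distribution.

```latex
\begin{equation*}
P(z_{\rm spectro} \,|\, z_{\rm photo}) =
\frac{P(z_{\rm photo} \,|\, z_{\rm spectro}) \, \frac{dN}{dz}(z_{\rm spectro})}
     {\int dz' \, P(z_{\rm photo} \,|\, z') \, \frac{dN}{dz}(z')}
\end{equation*}
```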

We present an alternative method to estimate dN/dz 
in this paper, that starts from the observation that the ob- 
served photometric redshift distribution is just the true red- 
shift distribution convolved with the photometric redshift er- 
rors. Phrased as such, estimating dN/dz is simply the prob- 
lem of deconvolving the redshift errors from the measured 
redshift distribution. This problem, like all deconvolution 
problems, is ill-conditioned and must be regularized to ob- 
tain a stable solution. 
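As an illustration of why regularization is needed and how it can enter, the following minimal sketch deconvolves a binned redshift distribution. It rests on assumptions that are not taken from the text: a single Gaussian error kernel with width 0.03, a quadratic penalty on second differences as the smoothness prior, and hypothetical function names. It is a toy version of the idea, not the implementation used in this paper.

```python
import numpy as np

def gaussian_response(z_true, z_phot, sigma):
    """Matrix A[i, j] ~ P(z_phot[i] | z_true[j]) for a single Gaussian error model."""
    diff = z_phot[:, None] - z_true[None, :]
    A = np.exp(-0.5 * (diff / sigma) ** 2)
    return A / A.sum(axis=0, keepdims=True)  # each column sums to 1

def second_difference(n):
    """(n-2) x n second-difference operator, used as the smoothness penalty."""
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]
    return D

def deconvolve(A, observed, lam):
    """Minimize |A x - observed|^2 + lam * |D x|^2 via stacked least squares."""
    D = second_difference(A.shape[1])
    lhs = np.vstack([A, np.sqrt(lam) * D])
    rhs = np.concatenate([observed, np.zeros(D.shape[0])])
    x, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return x

# Toy example: a smooth "true" dN/dz, convolved with the errors, then recovered.
z = np.linspace(0.0, 1.0, 51)
true_dndz = np.exp(-0.5 * ((z - 0.5) / 0.1) ** 2)
true_dndz /= true_dndz.sum()
A = gaussian_response(z, z, sigma=0.03)
observed = A @ true_dndz
recovered = deconvolve(A, observed, lam=1e-3)
```

The parameter `lam` trades fidelity to the observed distribution against smoothness; with noisy data it would have to be tuned, which is the stability question raised above.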

This paper is organized as follows - Sec. 2 describes the 
two sources for our calibration data, the SDSS and SDSS- 
2dF surveys, and presents our selection criteria for LRGs. 
In Sec. 3, we describe two photometric redshift algorithms 
and calibrate them against the catalogues from the previ- 
ous section and measure the photometric redshift error dis- 
tribution. Sec. 4 discusses using this error distribution to 
invert the observed photometric redshift distribution to re- 
construct the true redshift distribution, while Sec. 5 summa- 
rizes our conclusions. Whenever necessary, we have assumed 
a cosmology with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ and $H_0 = 100h$
km/s/Mpc.



2 SELECTING RED GALAXIES 

We start by describing the data that form our calibra- 
tion dataset, the Sloan Digital Sky Survey's spectroscopic 
(MAIN and Luminous Red Galaxy) survey and the SDSS- 
2dF LRG survey; the reader is referred to the appropriate
technical documents (Eisenstein et al. 2001; Strauss et al. 2002)
for a more detailed description. We then present the
exact cuts used to construct our sample of LRGs. These are
similar in spirit to those in Eisenstein et al. (2001) although
they differ in detail.

Since the photometry for the two catalogues we use is 
from the SDSS, we restrict our discussion in this paper to
the SDSS 5-filter photometric system (Fukugita et al. 1996;
Smith et al. 2002). The methods can be generalized to an
arbitrary photometric system. Except where explicitly specified,
we will use SDSS model magnitudes (Stoughton et al. 2002);
for instance, g will refer to an SDSS g band model
magnitude. SDSS Petrosian magnitudes will be denoted by
a subscripted "Petro", e.g. $r_{\rm Petro}$ is the SDSS r band
Petrosian magnitude.

Finally, a comment on the magnitude system used: it
has become traditional to use AB magnitudes (Oke & Gunn 1983)
for estimating photometric redshifts. The SDSS magnitudes
are close to AB magnitudes, but differ at the
millimag level (Abazajian et al. 2004). The final zeropoint
corrections for the SDSS have yet to be determined; we
use the best estimate of these offsets available at the
time of writing. The offsets applied are $\Delta(u, g, r, i, z)$ =
(-0.042, 0.036, 0.015, 0.013, -0.002). We note that the photometric
redshifts are not very sensitive to the precise values
of these offsets; not including them changes the measured
redshifts by $\Delta z \sim 0.005$, completely subdominant to the
photometric redshift errors. 

2.1 The SDSS Surveys 

The Sloan Digital Sky Survey (SDSS) is an ongoing survey
to image approximately $\pi$ steradians of the sky, and
follow up approximately one million of the detected objects
spectroscopically. The imaging is carried out by drift-scanning
the sky (Gunn et al. 1998) in photometric conditions
(Hogg et al. 2001) in 5 (ugriz) bands using a specially
designed wide-field camera. Using these imaging data
as a source, objects targeted for spectroscopy (Blanton et al. 2003;
Strauss et al. 2002) are observed with a 640 fiber spectrograph
on the same telescope. All of these data are processed
by completely automated pipelines that detect and
measure photometric properties of objects, and astrometrically
calibrate the data (Lupton 2004; Pier et al. 2003). The
SDSS is close to completion, and has had three major data
releases (EDR, Stoughton et al. 2002; DR1, Abazajian et al. 2003;
DR2, Abazajian et al. 2004). This paper will limit
itself to DR1, with approximately 168,000 spectra.

The data used in this paper are from the MAIN
(Strauss et al. 2002) and Luminous Red Galaxy (LRG;
Eisenstein et al. 2001) surveys. The MAIN galaxy sample
is a magnitude limited survey targeting all galaxies with
$r_{\rm Petro} < 17.77$. The SDSS LRG sample targets a smaller
set of galaxies with $r_{\rm Petro} < 19.5$; these galaxies are colour
selected to have strong 4000 Å breaks allowing a spectroscopic
determination of their redshifts even though they are
~ 2 magnitudes fainter than the MAIN galaxy sample. The
selection methodology of these galaxies forms the basis both
of the SDSS-2dF survey which we now discuss, and the selection
criteria we present in Sec. 2.3.

2.2 The SDSS-2dF Survey 

The second set of observations are the first data obtained 
as part of the SDSS-2dF LRG survey. This redshift survey,
started in early 2003, exploits the marriage of two facilities:
the wide-angle, multi-colour imaging data of the SDSS
and the 2dF spectrograph on the 4-meter Anglo-Australian
Telescope (AAT; Lewis et al. 2002). The SDSS-2dF LRG
survey is being carried out in tandem with the SDSS-2dF 
QSO survey to ensure optimal use of the 400 spectroscopic 
fibers available in the 2dF spectrograph. 

The goal of the SDSS-2dF LRG survey is to replicate the 
selection of SDSS LRGs but at a higher redshift, by going to 
fainter apparent luminosities. In particular, we aim to closely 
match the space density, luminosity range and colours of the 
lower redshift SDSS LRGs, thus allowing study of the evo- 
lution of a single population of massive galaxies over a large 
redshift range. To achieve this goal, we use the same methodology
as outlined in Eisenstein et al. (2001) for selecting the
low redshift SDSS LRGs, but adapt the colour and magnitude
cuts to preferentially select LRGs in the redshift range
0.45 < z < 0.7. The SDSS-2dF LRG cuts we use are similar
to those of the Cut II SDSS LRG sample discussed in detail
in Eisenstein et al. (2001). However, because of the larger
telescope (AAT), and the longer integration times possible,
we relax the r < 19.5 magnitude limit of Cut II (which resulted
in a severe redshift limit of z ~ 0.45 for the SDSS
LRGs) to i < 20. As discussed in Eisenstein et al. (2001),
the selection of LRGs above z ~ 0.4 is actually easier than
selecting them at lower redshifts because the 4000 Å break
moves into the SDSS r band and therefore, the SDSS r - i
colour is an effective estimate of the redshift, while the g - r
colour is a proxy for the rest-frame colour of the galaxy.

The details of the SDSS-2dF LRG selection criteria will
be presented elsewhere. However, as shown in Nichol (200 J)
and Fig. 3, the SDSS-2dF selection criteria successfully reproduce
the luminosity range covered by the lower redshift
SDSS LRGs (Eisenstein et al. 2001, both Cut I and Cut II)
over the expected range of redshifts from 0.4 < z < 0.75.
Note that the r — i colour selection is very effective at isolat- 
ing high redshift galaxies, with 90% of the galaxies having 
redshifts between 0.4 and 0.7, and virtually none with red- 
shifts < 0.3. The SDSS-2dF LRG and QSO surveys are 
underway with the goal of obtaining the final sample of 
~ 10,000 LRGs and quasars. The data we use are all the
data observed through 2003 with reliable spectroscopic red- 
shifts, a sample of ~ 3000 galaxies. 

2.3 Selection Criteria 

We now discuss the construction of a photometric sample 
of LRGs. Although the selection criteria we present here 
(including the terminology) are based on the spectroscopic 
selection used to construct the two samples discussed above, 
we emphasize that these are not the specific selection crite- 
ria for either sample, but rather are a synthesis of different 
selection techniques. The goal of these selection criteria is to 
photometrically select a uniform sample of LRGs over the 
redshift range 0.2 < z < 0.7. 

Fig. 1 shows a model spectrum of an early type
galaxy from the stellar population synthesis models of
Bruzual & Charlot (2003). This particular spectrum is derived
from a single burst of star formation 11 Gyr ago (implying
a redshift of formation, $z_{\rm form} \sim 2.6$), evolved to the
present, and is typical of LRG spectra. In particular, the
4000 Å break is very prominent. In order to motivate our



© 0000 RAS, MNRAS 000, 000-000 



4 Padmanabhan et al 



[Figure 1 image: model spectrum and SDSS filter response curves; x axis is wavelength in Å, with ticks from 4000 to 10000]

Figure 1. A model spectrum of an early type galaxy from
Bruzual & Charlot (2003). The model was formed from a single
burst of star formation 11 Gyr ago, and assumes a solar metallicity.
Note the prominent break in the spectrum at 4000 Å. Also
overplotted are the response functions (including atmospheric
absorption) for the SDSS filters.



selection criteria, we passively evolve this spectrum in redshift
(in particular, taking the evolution of the strength of
the 4000 Å break into account), and project it through the
SDSS filters; the resulting colour track in g - r - i space
as a function of redshift is shown in Fig. 2. The bend in
the track around z ~ 0.4, caused by the redshifting of the
4000 Å break from the g to r band, naturally suggests two
selection criteria - a low redshift sample (Cut I), nominally
from z ~ 0.2 - 0.4, and a high redshift sample (Cut II), from
z ~ 0.4 - 0.6. We define the two colours (Eisenstein et al.
2001, and private commun.)



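$$ c_\perp = (r - i) - (g - r)/4 - 0.18 \,, \qquad (1) $$
$$ d_\perp = (r - i) - (g - r)/8 \approx r - i \,. \qquad (2) $$

We now make the following colour selections.

$$ {\rm Cut\ I}: \quad |c_\perp| < 0.2 \,; \qquad (3) $$
$$ {\rm Cut\ II}: \quad d_\perp > 0.55 \,, \qquad (4) $$
$$ g - r > 1.4 \,, \qquad (5) $$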


as shown in Fig. 2. The final cut, g - r > 1.4, isolates our
sample from the stellar locus. In addition to these selection
criteria, we eliminate all galaxies with g - r > 3 and
r - i > 1.5; these constraints eliminate no real galaxies, but are
effective in removing stars with unusual colours.

Figure 2. The top panel shows simulated g - r and r - i colours
of an early-type galaxy as a function of redshift. The spectrum
used to generate the track is the same as in Fig. 1 but evolved in
redshift. Also shown are the colour cuts for Cut I (dashed, black)
and Cut II galaxies (solid, blue). The points show the stellar locus
as determined by a sample of stars with r-band magnitudes less
than 19.5. The lower panel shows the colours $c_\parallel$ (diamonds, black)
and $d_\perp$ (triangles, red), as a function of redshift. Also shown are
fiducial redshift boundaries for Cut I (0.2 - 0.4) and Cut II (0.4
- 0.6). Note that the range in g - r is identical to the range in
1 + z.

Unfortunately, as emphasized in Eisenstein et al. (2001),
these simple colour cuts are not sufficient to select
LRGs due to an accidental degeneracy in the SDSS filters
that causes all galaxies, irrespective of type, to lie very close
to the low redshift early type locus. We therefore follow the
discussion there and impose a cut in absolute magnitude.
We implement this by defining a colour to use as a proxy
for redshift and then translating the absolute magnitude cut
into a colour-magnitude cut. We see from Fig. 2 that $d_\perp$
correlates strongly with redshift and is appropriate to use for
Cut II. For Cut I, we define



$$ c_\parallel = 0.7 (g - r) + 1.2 (r - i - 0.18) \,, \qquad (6) $$



which is approximately parallel to the low redshift locus. 
Given these, we further impose 



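$$ {\rm Cut\ I}: \quad r_{\rm Petro} < 13.6 + c_\parallel/0.3 \,, \quad r_{\rm Petro} < 19.7 \,; \qquad (7) $$
$$ {\rm Cut\ II}: \quad i < 18.3 + 2 d_\perp \,, \quad i < 20 \,. \qquad (8) $$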



Note we use $r_{\rm Petro}$ for consistency with the SDSS LRG target
selection. We note that Cut I is identical (except in the
numerical values of the magnitude cuts in Eq. 7) to the
SDSS LRG Cut I, while the numerical values for Cut II
were chosen to yield a population consistent with Cut I
(see below). This was intentionally done to maximize the
overlap between any sample selected using these cuts, and
the SDSS LRG spectroscopic sample. The switch to the i
band for Cut II also requires some explanation. As is clear
from Fig. 1, the 4000 Å break is moving through the r band
throughout the fiducial redshift range of Cut II. This im- 
plies that the K-corrections to the r band are very sensitive 
to redshift; the i band K-corrections are much less sensitive 
to redshift allowing for a more robust selection. 
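The cuts of Eqs. 1-8 can be collected into a short sketch such as the one below. The function names are hypothetical. Two points are interpretation rather than something stated in the text: the colour limits g - r > 3 and r - i > 1.5 are treated as rejecting an object if either is exceeded, and g, r, i are taken to be model magnitudes with `r_petro` the r band Petrosian magnitude.

```python
def c_perp(g, r, i):
    """Eq. 1: colour perpendicular to the low redshift locus."""
    return (r - i) - (g - r) / 4.0 - 0.18

def d_perp(g, r, i):
    """Eq. 2: redshift proxy for Cut II."""
    return (r - i) - (g - r) / 8.0

def c_par(g, r, i):
    """Eq. 6: colour approximately parallel to the low redshift locus."""
    return 0.7 * (g - r) + 1.2 * (r - i - 0.18)

def passes_colour_limits(g, r, i):
    """Assumed reading: reject objects with g - r > 3 or r - i > 1.5."""
    return (g - r) <= 3.0 and (r - i) <= 1.5

def is_cut1(g, r, i, r_petro):
    """Cut I: Eq. 3 together with the magnitude cuts of Eq. 7."""
    return (passes_colour_limits(g, r, i)
            and abs(c_perp(g, r, i)) < 0.2
            and r_petro < 13.6 + c_par(g, r, i) / 0.3
            and r_petro < 19.7)

def is_cut2(g, r, i):
    """Cut II: Eqs. 4 and 5 together with the magnitude cuts of Eq. 8."""
    d = d_perp(g, r, i)
    return (passes_colour_limits(g, r, i)
            and d > 0.55
            and (g - r) > 1.4
            and i < 18.3 + 2.0 * d
            and i < 20.0)
```

For example, an object with (g, r, i) = (20.5, 18.9, 18.3) and an r band Petrosian magnitude of 18.9 passes Cut I but not Cut II, while one with (g, r, i) = (21.9, 20.2, 19.3) passes Cut II.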

The results of applying these cuts to the spectroscopic
catalogues are shown in Fig. 3. Since the SDSS spectroscopic
catalogue is at low redshift, we trim the catalogue using
Cut I, while the higher redshift SDSS-2dF data are trimmed
with Cut II. Our calibration dataset has 45,744 Cut I galaxies,
and 1,474 Cut II galaxies. The large number of low
redshift galaxies that pass Cut I indicates a failure of our
selection criteria at redshifts lower than z ~ 0.15, as already
noted by Eisenstein et al. (2001). We however leave
these galaxies in our analysis, since they will contaminate
any photometrically selected sample and it will be necessary
to understand their photometric redshift distributions.
No such problem exists for the Cut II galaxies, which have
a negligible fraction of z < 0.4 galaxies. The most significant
contaminants for Cut II are M stars. The g - r > 1.4
cut removes most of these, although there is a small residual
level of contamination. Analyses using this or similar samples
will have to estimate the effect of this contamination on
their results.

Figure 3. The top panel shows the spectroscopic redshift distribution,
dN/dz, of the SDSS (solid, black) and the SDSS-2dF
(dashed, red) samples trimmed using the selection criteria of
Sec. 2.3. Note that the SDSS sample is dominated by the low redshift
MAIN sample, accounting for the low normalization at high
redshift. The lower panel shows the i band absolute magnitude
distribution for the two samples, demonstrating that our absolute
magnitude cuts are selecting a sample with $M_i \sim -22$ as desired.
Both dN/dz and dN/dM are normalized so that they integrate
to unity.

The lower panel of Fig. 3 shows the absolute magnitude
distribution of both Cut I and Cut II galaxies. As expected,
the colour magnitude cuts restrict the sample to
bright galaxies; the median i band magnitude is $M_i \sim -22$.
Note that the Cut I and Cut II galaxies probe similar lu- 
minosities. The Cut I magnitude distribution also has a tail 
extending to low luminosities; this is the failure of the selec- 
tion criteria at low redshift we encountered above. 



3 PHOTOMETRIC REDSHIFT ESTIMATION 

Photometric redshift estimation techniques can be classified
into two groups, "empirical" and "template-fitting"
methods. Empirical methods (Connolly et al. 1995;
Wang et al. 1998; Brunner et al. 1999) are based on the observational
fact that galaxies are restricted to a low dimensional
surface in the space of their colours and redshift.
Using a training set of galaxies, these methods attempt to
parametrize this surface, either with low-order polynomials,
nearest-neighbour searches or neural networks. The advantage
of these methods is that they attempt to measure these
relationships directly from the data, and so implicitly correct
for any calibration biases present. A publicly available
example is the Artificial Neural Network code (ANNz;
Firth et al. 2003; Collister & Lahav 2004) that trains a neural
network to learn the relation between photometry and redshift
from an appropriate training set of observed galaxies
whose redshifts are already known. This code has a photometric
redshift accuracy similar to the methods described
below (Collister & Lahav 2004).

The fact that these methods rely on training sets is 
their greatest disadvantage. For these methods to work, the 
training set must densely sample the entire redshift-colour 
space of interest, as it is difficult to extrapolate outside the 
domain of the training set. Most training sets, including the 
samples constructed above, violate the above requirement 
and therefore are of limited utility. Template-based methods 
do not suffer from these drawbacks, and form the basis for 
the two algorithms used in this paper, which we now discuss. 



3.1 Simple Template Fitting 

Template fitting methods start with a set of model spectra
(the "templates") of galaxies, either from spectrophotometrically
calibrated observations of galaxies (Coleman et al. 1980)
or from stellar synthesis models (Bruzual & Charlot 2003;
Le Borgne & Rocca-Volmerange 2002). These methods
then attempt to reconstruct the observed colours of
galaxies by some (appropriately redshifted) linear combination
of the templates, projected through the appropriate
filters. The best fit redshift is then an estimate of the
galaxy's true redshift. Concretely, if we denote the templates
by $\Phi_i(z)$, this algorithm can be cast as a minimization of

$$ \chi^2(z) = \sum_\alpha \left[ \frac{f_\alpha - \sum_i a_i R_\alpha(\Phi_i(z))}{\sigma_\alpha} \right]^2 \,, \qquad (9) $$

where $f_\alpha$ is the observed flux (with error $\sigma_\alpha$) of the galaxy
in the $\alpha$ filter, $R_\alpha(\Phi)$ projects the spectrum onto the
$\alpha$ filter, and the $a_i$ are the coefficients of the linear
combination. For definiteness, we work with the AB photometric
system (Oke & Gunn 1983), where the apparent magnitude
of a galaxy, $m_{\rm AB}$, is related to its SED, $\Phi$ (with units of W
m$^{-2}$ Hz$^{-1}$), by

$$ m_{\rm AB} = -2.5 \log_{10} \left[ \frac{\int d\nu \, \nu^{-1} \Phi(\nu) W_\alpha(\nu)}{\int d\nu \, \nu^{-1} g(\nu) W_\alpha(\nu)} \right] \,, \qquad (10) $$

where $W_\alpha$ is the response of the $\alpha$ filter. The reference SED
is given by $g(\nu)$ = 3631 Jy (where 1 Jy = $10^{-26}$ W m$^{-2}$ Hz$^{-1}$).
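For the single-template case relevant to LRGs, the minimization in Eq. 9 can be sketched as a grid search over redshift with the template amplitude solved analytically at each step. The sketch below is illustrative only: the top-hat passbands, the step-function template and all function names are hypothetical stand-ins, not the SDSS filter curves or the stellar population template used in this paper.

```python
# Hypothetical top-hat passbands (Angstrom); NOT the real SDSS filter curves.
BANDS = [(3000.0, 4000.0), (4000.0, 5500.0), (5500.0, 7000.0),
         (7000.0, 8500.0), (8500.0, 10000.0)]

def template_fluxes(z, blue_level=0.4):
    """Toy template: flux 1 redward of the redshifted 4000 A break,
    blue_level blueward of it, averaged over each top-hat band."""
    edge = 4000.0 * (1.0 + z)
    fluxes = []
    for lo, hi in BANDS:
        red_frac = min(max((hi - edge) / (hi - lo), 0.0), 1.0)
        fluxes.append(red_frac + (1.0 - red_frac) * blue_level)
    return fluxes

def chi2_at(z, flux, sigma):
    """chi^2 at redshift z after minimizing analytically over the amplitude."""
    model = template_fluxes(z)
    num = sum(f * m / s ** 2 for f, m, s in zip(flux, model, sigma))
    den = sum((m / s) ** 2 for m, s in zip(model, sigma))
    amp = num / den
    return sum(((f - amp * m) / s) ** 2 for f, m, s in zip(flux, model, sigma))

def photo_z(flux, sigma, z_grid):
    """Redshift on the grid that minimizes chi^2."""
    return min(z_grid, key=lambda z: chi2_at(z, flux, sigma))

# Noise-free check: fluxes generated at z = 0.3 with an arbitrary amplitude.
z_grid = [0.005 * k for k in range(161)]  # 0 to 0.8
true_z = 0.3
flux = [2.5 * f for f in template_fluxes(true_z)]
sigma = [0.05] * 5
best = photo_z(flux, sigma, z_grid)
```

Solving for the amplitude in closed form keeps the search one-dimensional; with several templates the same step becomes a small linear least-squares problem for the coefficients at each trial redshift.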

One of the advantages of the LRGs is that their spectra
are well described by a single template (Eisenstein et al.
2003). We find that the LRG colours are well described by a
Bruzual & Charlot (2003)^2 single instantaneous burst template
at solar metallicity with the burst occurring when the
Universe was 2.5 Gyr old. The template evolves with time,
becoming redder and increasing the 4000 Å break, as the
more massive hot stars die. To incorporate this evolution, 
we interpolate between models with bursts with ages of [11, 
5, 2.5, 1.4, 0.9, 0.64, 0.1] Gyr to calculate the template as a 
function of redshift. The photometric redshifts we derive are 
insensitive to the precise time of the burst, and therefore, 
we do not attempt any optimization of this parameter. 

The implementation of this method we use is part
of the IDLSPEC2D SDSS spectroscopic reduction pipeline
(Burles & Schlegel 2004) and can be downloaded through
the WWW^3.



3.2 Hybrid Methods 

Obviously, this simple template-fitting algorithm is effective 
only when the templates accurately describe the photomet- 
ric properties of the galaxies for which one wants to estimate 
redshifts. This suggests generalizing the template-fitting algorithm
by incorporating features of empirical methods
(Csabai et al. 2000; Budavari et al. 2000, 2001). The basic
approach is to divide a training set into spectral classes cor- 
responding to each of the templates. Given these training 
sets for the individual templates, one can repair the tem- 
plates by adjusting them to better reproduce the measured 
colours of the galaxies in the training set. By repeating 
this classification and repair procedure, one can obtain an
improved template set that yields more reliable photometric
redshifts (Csabai et al. 20 0y). Moreover, this process of
adjusting the templates to agree with observations makes
hybrid methods potentially less sensitive to systematic
problems due to errors in the filter curves or photometric
zeropoints. We refer the reader to the above papers
for details on the implementation of this algorithm.
For the LRGs, we start with an elliptical template from
Coleman et al. (1980) and apply the above algorithm to optimize
it. This is done in an iterative manner and typically converges
after three iterations.

This single optimized template is used for an initial redshift
estimate for all LRGs. The SDSS LRG sample is, however,
selected assuming a passively evolving elliptical template
(Eisenstein et al. 2001). Therefore, we expect to gain
in photometric redshift accuracy if we allow the LRG spec- 
tral template to evolve with redshift. The SDSS and SDSS- 
2dF redshift samples are subdivided into three redshift intervals,
0.00 < z < 0.35, 0.25 < z < 0.45 and 0.35 < z < 1.0,
based on the photometric redshifts of the individual galax- 
ies. Within each interval we optimize the spectral template 
as described above. The overlapping redshift intervals pro- 
vide a smooth progression in spectral type from one redshift 
interval to the next, as well as ensuring that the number and 
distribution of calibration redshifts is sufficient to constrain 
the colours of galaxies across a broad spectral range. 



^2 These template stellar population spectra are
part of the GALAXEV package available at
http://www.cida.ve/~bruzual/bc2003.
^3 http://spectro.princeton.edu




[Figure 4 image: photometric versus spectroscopic redshift scatter plots; x axis is spectroscopic redshift]

Figure 4. Scatter plot showing the photometric redshift versus
the spectroscopic redshift for a random 10000 galaxies from our
calibration sample. The upper panel shows the results for the
simple template fitting code of Sec. 3.1 and the lower panel shows
the results for the hybrid code of Sec. 3.2. The solid (red) line has
slope 1, while the dashed line marks the fiducial lower redshift
limit of any photometric LRG sample. The difficulty of estimating
redshifts at z ~ 0.4 is evident from the increased scatter.

3.3 Results 

We now apply the methods of the previous two sections to
our calibration dataset; the results are summarized in Fig. 4.
Both are essentially unbiased ($|\Delta z_{\rm mean}| < 0.01$) at redshifts
less than 0.5. At higher redshifts, the photometric redshifts
are systematically lower than the spectroscopic redshifts by
about 5%. The scatter in both methods is approximately
$\sigma \sim 0.035$, except at redshifts greater than 0.55, where the
scatter grows to ~ 0.06, caused both by increased photometric
scatter and increased uncertainties in templates (caused,
e.g., by star formation or emission lines).
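The bias and scatter quoted above are summary statistics of the redshift error in bins of spectroscopic redshift. A minimal sketch of such a measurement follows; the function name, the bin edges and the toy numbers are hypothetical, and the error is defined as spectroscopic minus photometric redshift, which is an assumed sign convention.

```python
import statistics

def binned_bias_scatter(z_spec, z_phot, edges):
    """Return (z_lo, z_hi, mean error, rms scatter about the mean, count)
    for each spectroscopic redshift bin with at least two galaxies."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        dz = [zs - zp for zs, zp in zip(z_spec, z_phot) if lo <= zs < hi]
        if len(dz) < 2:
            continue
        bias = statistics.fmean(dz)
        scatter = statistics.pstdev(dz)
        rows.append((lo, hi, bias, scatter, len(dz)))
    return rows

# Toy input, not real data.
z_spec = [0.10, 0.12, 0.30, 0.32]
z_phot = [0.11, 0.11, 0.28, 0.30]
rows = binned_bias_scatter(z_spec, z_phot, edges=[0.0, 0.2, 0.4])
```

With real catalogues one might prefer a robust scatter estimate, since a few catastrophic failures can dominate a plain standard deviation.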

There are two noticeable features in Fig. 4 that deserve
comment. The first is that the hybrid methods do significantly
better than the single template fits at low redshifts
(z < 0.15). This is due to the failure of the LRG selection
criteria at low redshifts; a single elliptical template no longer 
well describes this population. This highlights an important 
advantage of the hybrid methods - they adjust their tem- 
plates to better describe the populations. 

The second feature is the increased scatter around z ~
0.4, caused by an accidental degeneracy due to the SDSS
filters. Fig. 1 shows a gap between the g and r bands at about
5500 Å^4. As the 4000 Å break enters this gap at z ~ 0.38, the
lack of coverage in either the g or r band causes a degeneracy
between the strength of the 4000 Å break and its location,
increasing the redshift errors.

It is useful to be able to separate the effects of template

^4 This gap is partly intentional, avoiding the OI (5577 Å) night-sky
emission lines. However, the filters were intended to overlap
more.






Calibrating Photometric Redshifts of Luminous Red Galaxies 7 



[Figure 6 image: grid of histogram panels, one per redshift slice; x axis is the redshift error, with ticks from -0.3 to 0.3; panel labels (central redshifts) include 0.075, 0.175, 0.225, 0.325, 0.375 and 0.475]

Figure 6. The double Gaussian fits to the error distribution as a function of spectroscopic redshift. The x axis shows the redshift error,
$z_{\rm spectro} - z_{\rm photo}$, and each panel is a redshift slice with the central redshift shown in the upper left. The histogram is the measured
distribution, while the curves are the best fit Gaussians (both individually and summed). The data here are SDSS galaxies selected using
Cut I. The photometric redshifts are estimated using the method of Sec. 3.2.



errors from photometric errors in the redshift error budget. 
In order to do this, we simulate galaxies by uniformly dis- 
tributing them between < z < 0.8 with synthetic colours 
given by the template used in the method of Sec l3.1l We 
then add errors to these synthetic fluxes; focussing on two 
extreme cases - uniform errors across all 5 bands, and no 
S/N in the u and z band (i.e. infinite magnitude errors cor- 
responding to non-detection in u and z band, with uniform 
errors in the other bands). The latter case is motivated by 
the fact that the SDSS camera is least sensitive in the u 
and z bands, and because most LRGs are not detected in 
the u band. The results of this exercise are shown in Fig. 5.
The upper panels show a realization with an (optimistic)
magnitude error, σ_m ~ 0.03. For comparison, the median
S/N (~ 1/σ_m) for LRGs at z ~ 0.3 is ~ (2, 30, 70, 80, 30),



and ~ (0.5, 10, 25, 36, 15) at z ~ 0.5. A prominent feature 
is the degeneracy at z ~ 0.4 discussed above, for the case 
where the u and z bands have no signal. In this case, there 
is no extra information that can be used to break the
degeneracy between the 4000 Å break strength and its location.
Also, the scatter in the photometric redshifts increases
for redshifts greater than ~ 0.35, coinciding with the 4000 Å
break moving into the r band, and the loss of redshift
information from the g - r colour. The lower panel shows
how the redshift errors increase with increasing magnitude 
errors, again for the cases of uniform S/N in all bands, and 
in g,r,i with zero S/N in u and z. We also note that the 
redshift errors we measure are consistent with being caused 
principally by photometric scatter. However, the bias seen
at high redshifts cannot be caused by photometric errors,
and suggests either errors in the template, or errors in the
photometric zeropoints or filter curves.

Figure 5. Simulations showing the effect of magnitude errors on
the accuracy of the photometric redshifts. The upper left plot
shows the reconstructed photometric redshifts for a magnitude
error, σ_m = 0.03 in all 5 bands, while the upper right panel has no
S/N in the u and z bands and σ_m = 0.03 in the remaining bands.
The lower panel shows the redshift error induced by magnitude
errors; the solid line has constant error across the bands, while
the dashed line has constant error in g, r, i and zero S/N in u
and z. Since the magnitude errors are independent of redshift,
the redshift errors are simply computed over the entire redshift
range.

In order to parametrize the error distribution, we divide
the calibration data into redshift bins of width 0.05 (except
between z = 0.6 and 0.7, which we combine because
of the small number of galaxies in that range). Within each
of these ranges, we fit the error distribution with a sum of
Gaussians,

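$$ P(\delta z = z_{\rm spectro} - z_{\rm photo}) = \frac{1}{\mathcal{N}} \sum_{i=1}^{N} b_i \exp\left[ -\frac{(\delta z - m_i)^2}{2\sigma_i^2} \right] , \qquad (11) $$

where $\mathcal{N}$ is the normalization given by

$$ \mathcal{N} = \sum_{i=1}^{N} \sqrt{2\pi} \, b_i \sigma_i . \qquad (12) $$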
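
As a concrete illustration of Eqs. 11 and 12, the following minimal Python sketch evaluates a sum-of-Gaussians error model and checks numerically that it integrates to unity. The function name `error_pdf` and the parameter values are illustrative assumptions; they are not taken from Tables 1 or 2.

```python
import math


def error_pdf(dz, components):
    """Sum-of-Gaussians model for P(dz), with dz = z_spectro - z_photo.

    components is a list of (b_i, m_i, sigma_i): relative amplitude,
    mean and standard deviation of each Gaussian.
    """
    # Normalization, N = sum_i sqrt(2 pi) b_i sigma_i
    norm = sum(math.sqrt(2.0 * math.pi) * b * s for b, _, s in components)
    total = sum(b * math.exp(-0.5 * ((dz - m) / s) ** 2)
                for b, m, s in components)
    return total / norm


# Hypothetical parameters: a tight core plus a broad, slightly offset wing.
components = [(1.0, 0.0, 0.02), (0.2, -0.01, 0.06)]

# Simple Riemann sum over -1 < dz < 1 to check the normalization.
step = 1.0e-4
integral = sum(error_pdf(-1.0 + k * step, components)
               for k in range(20001)) * step
print(f"integral of P(dz) = {integral:.4f}")
```

With b_1 = 1 and b_2 = b, this matches the convention of the table captions, where b is the ratio of the amplitude of the second Gaussian to the first.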

We find, as shown in Figs. 6 and 7, that the error distribu-
tion is well approximated by two Gaussians. The parameters 
of the fits for both the simple template and hybrid methods 
are in Tables 1 and 2 respectively. We note that the cores of 
these error distributions are significantly tighter than the er- 
rors mentioned above. However, the error distributions typ- 
ically have long wings that are responsible for most of the 
measured RMS errors. The discrepancies between the SDSS 
and SDSS-2dF samples in the overlap region are due to a 
colour bias introduced by the sharp colour cuts, resulting in 
a bias in the redshift estimation for Cut II galaxies between 
z = 0.35 - 0.45. We therefore recommend using the SDSS 



SDSS/SDSS-2dF Photometric redshift errors 
Single Template Fitting 
Double Gaussian fits 

Table 1. Double Gaussian fits to the photometric redshift error,
z - z_p, as a function of z for the SDSS and SDSS-2dF data. (m_1, σ_1)
and (m_2, σ_2) are the mean and standard deviation of the first and
second Gaussians respectively, while b is the ratio of the amplitude
of the second Gaussian to the first. The photometric redshifts
were computed using the method of Sec. 3.1. We recommend using
the SDSS distributions to z = 0.45 and SDSS-2dF for higher
redshifts.



SDSS/SDSS-2dF Photometric redshift errors 
Double Gaussian fits 

Table 2. Same as Table 1 except that the photometric redshifts 
were computed using the methods of Sec. 3.2. 



distributions to z = 0.45 for samples constructed by combining Cut I 
and Cut II. 

In addition to measuring the error distribution, it is use- 
ful to measure the fraction of galaxies whose redshifts are 
"catastrophically" wrong. We define a catastrophic failure 
as a photometric redshift that differs from the spectroscopic 
redshift by more than Δz_c, where we use Δz_c = 0.1 and 0.2.
For Δz_c = 0.1, we have a catastrophic failure rate of 3.5%
for the simple template fitting algorithm, and 1.5% for the
hybrid algorithm. However, a large fraction of these failures is
due to the underestimation of the photometric redshifts
at z > 0.5. If we increase Δz_c to 0.2, the failure rate drops
to under 0.5%.

Figure 7. Same as Fig. 6 except for the SDSS-2dF galaxies selected using Cut II and from redshifts 0.35 to 0.7.
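
The catastrophic failure rate defined above is straightforward to compute from a calibration sample. The sketch below is only an illustration: the function name `catastrophic_fraction` is hypothetical and the five redshift pairs are made-up numbers, not data from the SDSS or SDSS-2dF samples.

```python
def catastrophic_fraction(z_spec, z_phot, dz_c):
    """Fraction of galaxies with |z_spectro - z_photo| > dz_c."""
    if len(z_spec) != len(z_phot) or not z_spec:
        raise ValueError("need two non-empty lists of equal length")
    n_fail = sum(1 for zs, zp in zip(z_spec, z_phot) if abs(zs - zp) > dz_c)
    return n_fail / len(z_spec)


# Made-up calibration sample, for illustration only.
z_spec = [0.30, 0.42, 0.55, 0.61, 0.25]
z_phot = [0.31, 0.40, 0.40, 0.36, 0.26]

for dz_c in (0.1, 0.2):
    rate = catastrophic_fraction(z_spec, z_phot, dz_c)
    print(f"dz_c = {dz_c}: failure rate = {rate:.2f}")
```

As in the text, loosening the threshold from 0.1 to 0.2 can only lower the measured failure rate.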



4 ESTIMATING dN/dz 

In the previous section, we estimated the photometric red- 
shift error distributions as a function of the true redshift of 
the galaxy. With this in hand, we turn to the problem of 
estimating the actual redshift distribution, dN/dz, of a
sample of galaxies given the distribution of their photometric
redshifts, [dN/dz]_p.

As discussed in the Introduction, the apparently trivial 
solution to this problem is to measure the error distribution 
not as a function of the true redshift, but as a function of 
photometric redshift. One can then add these distributions, 
weighted by [dN/dz]_p, to estimate the true redshift
distribution. The problem with this approach is that the
photometric redshift error distributions measured as a function of
photometric redshift depend sensitively on the selection
criteria of the calibration sample. If these criteria don't match
those of the full sample (and in general, they will not), then
dN/dz estimated using the above technique will be biased.

In order to proceed, we observe that the photometric 
redshift distribution is simply the convolution of the true 
redshift distribution with redshift errors. 



$$ \left[\frac{dN}{dz}\right]_p = \left(\frac{dN}{dz}\right) \otimes A \qquad (13) $$

If we define A(z - z_p, z) as the probability that a galaxy at
redshift z is scattered to photometric redshift z_p, then we
can write out the above more concretely,

$$ \left[\frac{dN}{dz}\right]_p (z_p) = \int_0^\infty dz' \, \frac{dN}{dz}(z') \, A(z' - z_p, z') \qquad (14) $$

where the left side has the known [dN/dz]_p, while the right
is the unknown dN/dz. Eq. 14 is a Fredholm equation of









the first kind ^ and is ubiquitous throughout astronomy
(Craig & Brown 1986). Unfortunately, such problems do not
possess a unique solution and moreover, are ill-conditioned.
Small perturbations in the data can produce solutions that
are arbitrarily different. This is not surprising, given that
Eq. 14 describes a smoothing operator that generically loses
information, implying that the solution will in general
require incorporating some "prior" knowledge about dN/dz.



4.1 Discretization and The Classical Solution 



We begin by approximating [dN/dz]_p as a stepwise constant
function measured in n bins, [z_p^i, z_p^{i+1}) with i = 0, ..., n - 1,
and dN/dz in m bins, [z^j, z^{j+1}) where j = 0, ..., m - 1.
Substituting into Eq. 14, we obtain



$$ \left[\frac{dN}{dz}\right]_{p,i} = A_{ij} \left[\frac{dN}{dz}\right]_j \qquad (15) $$



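where we assume the Einstein summation convention. The
response matrix A_ij is given by

$$ A_{ij} = \frac{1}{z_p^{i+1} - z_p^{i}} \int_{z_p^{i}}^{z_p^{i+1}} dz'_p \int_{z^{j}}^{z^{j+1}} dz' \, A(z' - z'_p, z') . \qquad (16) $$

For the specific case where A can be described by a sum
of N Gaussians (Eq. 11), one can do one of the integrals
explicitly to obtain

$$ A_{ij} = \frac{1}{z_p^{i+1} - z_p^{i}} \int_{z^{j}}^{z^{j+1}} dz' \, \frac{1}{\mathcal{N}} \sum_{k=1}^{N} \sqrt{\frac{\pi}{2}} \, b_k \sigma_k \left[ f(m_k, z_p^{i+1}, \sigma_k, z') - f(m_k, z_p^{i}, \sigma_k, z') \right] \qquad (17) $$

where we define

$$ f(m, z_p, \sigma, z) = {\rm erf}\left( \frac{|z_p - z + m|}{\sqrt{2}\sigma} \right) {\rm sgn}\left( \frac{z_p - z + m}{\sqrt{2}\sigma} \right) \qquad (18) $$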



where sgn is the sign operator and erf is the error function.
Note that discretizing the problem has recast an integral
equation (Eq. 14) into a matrix problem (Eq. 15), albeit
with a non-square matrix. We can obtain a solution to this
problem by singular value decomposition (SVD; Press et al.
1992). We denote this the classical solution since we do not
explicitly use any prior information about dN/dz.
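
The construction of the response matrix can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code used here: the error kernel is taken to be a single Gaussian in z - z_p whose mean and width do not vary with redshift, the integral over z_p is done analytically with the error function, and the remaining integral over the true redshift bin uses a midpoint rule. The names `zp_integral` and `response_matrix` are hypothetical.

```python
import math


def zp_integral(z_true, lo, hi, mean, sigma):
    """Probability that a galaxy at z_true scatters into lo <= z_p < hi,
    when z_true - z_p is Gaussian with the given mean and sigma."""
    s = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((hi - z_true + mean) / s)
                  - math.erf((lo - z_true + mean) / s))


def response_matrix(zp_edges, z_edges, mean, sigma, n_sub=20):
    """A[i][j]: average over photometric bin i of the kernel integrated
    over true-redshift bin j (single-Gaussian kernel, assumed constant)."""
    n, m = len(zp_edges) - 1, len(z_edges) - 1
    A = [[0.0] * m for _ in range(n)]
    for i in range(n):
        width_p = zp_edges[i + 1] - zp_edges[i]
        for j in range(m):
            h = (z_edges[j + 1] - z_edges[j]) / n_sub
            acc = 0.0
            for k in range(n_sub):  # midpoint rule over the true-z bin
                z_mid = z_edges[j] + (k + 0.5) * h
                acc += zp_integral(z_mid, zp_edges[i], zp_edges[i + 1],
                                   mean, sigma)
            A[i][j] = acc * h / width_p
    return A


# Example: 10 photometric bins on [0.1, 0.7], 20 true-redshift bins on [0, 1].
zp_edges = [0.1 + 0.06 * i for i in range(11)]
z_edges = [0.05 * j for j in range(21)]
A = response_matrix(zp_edges, z_edges, mean=0.0, sigma=0.03)
print(len(A), "x", len(A[0]), "response matrix")
```

With fewer photometric bins than true-redshift bins the matrix is non-square, which is why a direct inverse is not available and an SVD-based solution is used.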

In order to understand the behaviour of the classical 
solution, we test it on simulations of the photometric redshift 
distribution. We start by distributing galaxies randomly in 
redshift between z = 0 and z = 1 according to

$$ \frac{dN}{dz} \propto \frac{z^2}{1 + \exp(20z - 14)} . \qquad (19) $$

This distribution initially grows as z^2, and is exponentially
cut off at z ~ 0.6, and approximates a volume limited
distribution with a magnitude limit at high redshifts. Random
redshift errors, using the model of Table 1, are added to
obtain photometric redshifts, z_p. For redshifts greater than
0.7, the errors are sampled from a Gaussian whose mean and
width are obtained by linearly extrapolating the errors from
Table 1. Finally, we restrict to galaxies with z_p ∈ [0.1, 0.7].
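
The simulation described above can be sketched as follows. This is a simplified illustration rather than the procedure used for the results in this paper: it assumes dN/dz proportional to z^2 / (1 + exp(20z - 14)), draws true redshifts by rejection sampling, and, for brevity, replaces the double Gaussian error model of Table 1 with a single zero-mean Gaussian of width 0.03 (an assumption). All function names are hypothetical.

```python
import math
import random


def dndz_shape(z):
    """Unnormalized true redshift distribution (assumed form of Eq. 19)."""
    return z * z / (1.0 + math.exp(20.0 * z - 14.0))


def sample_true_redshifts(n, rng, z_max=1.0, n_grid=1000):
    """Draw n redshifts in [0, z_max] by rejection sampling."""
    # Envelope height: grid maximum with a small safety margin.
    peak = 1.01 * max(dndz_shape(z_max * k / n_grid) for k in range(n_grid + 1))
    out = []
    while len(out) < n:
        z = rng.uniform(0.0, z_max)
        if rng.uniform(0.0, peak) < dndz_shape(z):
            out.append(z)
    return out


def add_photoz_errors(z_true, rng, sigma=0.03):
    """Scatter true redshifts with a single Gaussian (simplifying assumption)."""
    return [z - rng.gauss(0.0, sigma) for z in z_true]


rng = random.Random(42)
z_true = sample_true_redshifts(5000, rng)
z_phot = add_photoz_errors(z_true, rng)
# Keep only galaxies whose photometric redshift lies in [0.1, 0.7].
selected = [(zt, zp) for zt, zp in zip(z_true, z_phot) if 0.1 <= zp <= 0.7]
print(len(selected), "of", len(z_true), "galaxies pass the z_p cut")
```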



^ For a non-technical introduction, see Press et al. (1992), Chap. 
18. 




Figure 8. Results of simulations of the classical solution of the 
redshift inversion problem. The solid [black] histogram is the true 
redshift distribution, while the broken [red] histogram shows the 
photometric redshift distribution. The connected boxes [green] 
and stars [blue] show the reconstructed redshift distributions for 
different discretizations (10 and 15 bins, respectively) of the pho- 
tometric redshift distributions. In both cases, the reconstructed 
distribution is parametrized by 20 step functions. 



An example of the true and photometric redshift distributions
is shown in Fig. 8.

The photometric redshift distribution is then discretized
into n bins, [dN/dz]_{p,i}. We present results for
n = 10 (Δz = 0.06) and n = 15 (Δz = 0.04). The estimated
dN/dz is likewise parametrized as a piecewise constant function
from z = 0 to z = 1 with a step width of Δz = 0.05.
Using these parametrizations, we construct the response matrix
A_ij (Eq. 16) and solve for dN/dz using Eq. 15. For the
parameters considered here (and indeed, for generic choices),
this is an underdetermined linear system. We solve it using
SVD and backsubstitution (Press et al. 1992), setting singular
values < 10~^ to zero. Fig. 8 shows the estimated dN/dz
averaged over 50 simulations, and compares it to the true
redshift distribution.

The first observation is that the classical solution re- 
constructs the redshift distribution accurately for certain 
choices of discretizations, and in particular, for discretiza- 
tions of the photometric distributions with step sizes ap- 
proximately the width of the photometric redshift errors. 
The largest errors are at z > 0.9; these result from the fact
that dN/dz is almost completely unconstrained at these
redshifts by [dN/dz]_p, as only 6 per cent of objects at these
redshifts scatter to z < 0.7.

We also observe that as we increase the resolution of
[dN/dz]_p, the reconstruction becomes unstable, ringing at the
edges of the photometric redshift catalogue. Note that the
reconstructions in Fig. 8 are averages, and the instabilities
in a single reconstruction are significantly larger.

This behaviour has a simple, intuitive explanation. The
effect of photometric redshift errors is to smooth away the
high frequency ((redshift error)^-1) components in dN/dz.
However, [dN/dz]_p has high frequencies due to noise in the
data, and these induce large oscillations in the reconstruction.
To be more quantitative, we start with a simplified
model of the photometric errors,



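$$ A(z - z_p, z) \propto \exp\left[ -\frac{(z - z_p)^2}{2\sigma^2} \right] . \qquad (20) $$

A component of dN/dz with frequency k will be attenuated
by a factor of

$$ \int dz \, \exp\left[ -\frac{(z - z_p)^2}{2\sigma^2} \right] e^{ikz} \propto \exp\left( -\frac{k^2 \sigma^2}{2} \right) . \qquad (21) $$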



However, [dN/dz]p has a Poisson noise component that 
tends to a constant at high frequencies. Therefore, the in- 
version excites high frequency modes in the reconstruction 
with amplitude ∝ exp(k^2 σ^2 / 2). Eq. 21 also implies that this
becomes significant for modes with k > 1/σ, agreeing with
our intuitive picture.

The effect of the discretization step size on the stability
of the classical solution is now clear; discretization cuts
off frequencies higher than ~ 1/δz, where δz is the step size,
filtering out the problematic modes. This also suggests that
the ideal discretizations have δz ~ σ, as demonstrated in
our simulations.
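
The attenuation factor exp(-k^2 σ^2 / 2) can be checked numerically. The short sketch below, an illustration with a hypothetical function name, convolves cos(kz) with a unit-area Gaussian of width σ and compares the surviving amplitude with the analytic factor; the inverse of that factor is the amplification that the classical inversion applies to noise at the same frequency.

```python
import math


def smoothed_amplitude(k, sigma, n_steps=4001, half_width=8.0):
    """Amplitude of cos(k z) after convolution with a unit-area Gaussian
    of width sigma, evaluated at z_p = 0 by direct numerical integration."""
    h = 2.0 * half_width * sigma / (n_steps - 1)
    total = 0.0
    for s in range(n_steps):
        z = -half_width * sigma + s * h
        gauss = math.exp(-0.5 * (z / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
        total += gauss * math.cos(k * z) * h
    return total


sigma = 0.03
for k_sigma in (0.3, 1.0, 3.0):
    k = k_sigma / sigma
    numeric = smoothed_amplitude(k, sigma)
    analytic = math.exp(-0.5 * k_sigma ** 2)
    print(f"k*sigma = {k_sigma}: numeric {numeric:.5f}, analytic {analytic:.5f}, "
          f"noise amplification {1.0 / analytic:.1f}")
```

The rapid growth of the amplification beyond k ~ 1/σ is the reason bins much narrower than σ destabilize the unregularized solution.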



4.2 Regularization 

We would like to modify the classical solution so that it 
becomes less sensitive to the inversion instability discussed 
in the previous section. In order to do so, it is useful to 
rephrase the classical solution as a minimization problem^. 
If we define the energy functional, 



$$ E_0 = \left| A_{ij} \left[\frac{dN}{dz}\right]_j - \left[\frac{dN}{dz}\right]_{p,i} \right|^2 \qquad (22) $$



then the classical solution is the value of dN/dz that
minimizes E_0. Given this description, it is trivial to include
a penalty function that imposes smoothness on the
reconstructed function,



$$ E = E_0 + \lambda P \qquad (23) $$



where P is the penalty function and λ adjusts the relative
weight of P in the minimization of E^. There are a number
of possible choices for P that would impose smoothness;
we use the forward difference operator,



$$ P = \sum_{j=0}^{m-2} \left( \left[\frac{dN}{dz}\right]_{j+1} - \left[\frac{dN}{dz}\right]_j \right)^2 . \qquad (24) $$



Figure 9. The value of Ξ^2 as a function of λ for the simulations
discussed in the text. λ has been rescaled such that λ = 1
corresponds to equal weight being given to E_0 and P in Eq. 23. The
dotted and dashed lines show the error and stability components
of Eq. 25 respectively. As expected, the error term increases with
increasing λ, while the stability term decreases with increasing λ.
The minimum of Ξ^2 occurs near λ = 0.5.

There remains the problem of choosing an appropriate
value for λ. Unfortunately, there is no general method
for choosing an optimal value. The best that we can do
is to define a general merit function that objectively
selects an appropriate range for λ. Based on the discussion
in Craig & Brown (1986), we use



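$$ \Xi^2 = \frac{1}{n} \sum_{i} \left( A_{ij} \left[\frac{dN}{dz}\right]_j - \left[\frac{dN}{dz}\right]_{p,i} \right)^2 + \frac{1}{m} \sum_{j} \left\langle \left( \left[\frac{dN}{dz}\right]_j - \left[\frac{dN}{dz}\right]_{{\rm av},j} \right)^2 \right\rangle \qquad (25) $$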



where the average reconstruction [dN/dz]_{av,j} is estimated
either from simulations or bootstrap resampling. The first
term is a measure of how well the reconstructed dN/dz
reproduces the observed [dN/dz]_p; this term is minimized as
λ → 0* and increases with increasing λ. The second term,
the error in the reconstruction, measures its stability to the
presence of noise in the data. As λ → ∞, the penalty function
dominates the minimization and the reconstruction is
the most stable. As λ decreases, the reconstruction is more
sensitive to noise in the data, increasing this term. Choosing
a value of λ near^ the minimum of Ξ^2 picks a compromise
between an accurate and stable reconstruction.
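
Minimizing E = E_0 + λP with the forward difference penalty is a linear least-squares problem: setting the gradient to zero gives the normal equations (A^T A + λ D^T D) x = A^T y, where x is the binned dN/dz, y is the binned [dN/dz]_p and D is the forward difference matrix. The sketch below solves these equations in plain Python. It is a minimal illustration, not the implementation used here; the function names are hypothetical, and λ is not rescaled as in Fig. 9.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def regularized_solution(A, y, lam):
    """Minimize |A x - y|^2 + lam * sum_j (x[j+1] - x[j])^2."""
    n, m = len(A), len(A[0])
    # Normal-equation matrix A^T A and right-hand side A^T y.
    M = [[sum(A[i][a] * A[i][b] for i in range(n)) for b in range(m)]
         for a in range(m)]
    rhs = [sum(A[i][a] * y[i] for i in range(n)) for a in range(m)]
    # Add lam * D^T D for the forward difference operator D.
    for j in range(m - 1):
        M[j][j] += lam
        M[j + 1][j + 1] += lam
        M[j][j + 1] -= lam
        M[j + 1][j] -= lam
    return solve_linear(M, rhs)


# Toy underdetermined system (2 equations, 3 unknowns), for illustration.
A_toy = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
x_fit = regularized_solution(A_toy, [2.0, 2.0], lam=0.1)
print([round(v, 6) for v in x_fit])
```

In the toy system the data alone do not fix the three unknowns; the smoothness penalty selects the constant solution that fits the data exactly. For very large λ the solution tends to a constant regardless of the data, which is the behaviour expected of this penalty.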

In order to test this method, we return to the simulations
of the previous section. Since the regularization
removes the sensitivity to the discretization of the photometric
distribution, we discretize [dN/dz]_p into 50 bins of
thickness Δz = 0.012. The estimated redshift distribution
is parametrized by 40 step functions of width Δz = 0.025.
Given these parameters, we must estimate the appropriate
value of λ. To do this, we run 50 simulations for a given
value of λ to evaluate Ξ^2, and repeat this for a grid of values
of λ. The results are shown in Fig. 9. We note that Ξ^2 has
a well defined minimum, with the error and stability terms
demonstrating the λ dependence that we anticipated. Note
that the error term does not go to zero as λ → 0, but
appears to asymptote to a non-zero constant. This is readily
understood in terms of the discussion in the previous section:
the measured [dN/dz]_p has a high-frequency noise
component that cannot be reproduced by the convolution
of dN/dz with the redshift errors. It is this noise component
that is responsible for the non-zero value of the error term
in Ξ^2 as λ → 0.

^ For an alternative approach to solving this problem, see Lucy
(1974).

^ This approach appears in the literature as the method of
regularization, the Phillips-Twomey method, the constrained
linear inversion method and Tikhonov-Miller regularization
(Press et al. 1992).

* Assuming the generic case of an underdetermined system, m > n.

^ We are being intentionally vague here; the precise minimum
may not be the optimal choice. However, the value of Ξ^2 provides
a measure of the error that one is making as we move from the
minimum.

The upper panel of Fig. 10 shows the average of 50
reconstructions of dN/dz for values of λ near the minimum
of Ξ^2. We observe that for all the values of λ considered
in the figure, the reconstructions closely match the input
redshift distribution for all redshifts < 0.7. As before, the
largest discrepancies are at high redshift because of the lack
of constraint due to the upper photometric redshift limit
of 0.7. It is also instructive to consider extreme values of
λ; these are shown in the lower panel of Fig. 10. For small
values of λ, the reconstructions are extremely noisy, while
for large values of λ, the penalty function dominates the
reconstruction. Note that the forward difference operator
(Eq. 24) represents a constant prior, which is what we see
the reconstructions approaching as λ → ∞.

We make one cautionary observation. Based on Fig. 10,
one might conclude that the best strategy for choosing λ is
to preferentially choose a smaller value than what is
suggested by the minimum of Ξ^2. We however discourage this
because, as indicated in Fig. 9, such reconstructions are very
noisy. This lack of stability would result in small errors in
the redshift error distribution being amplified in the
reconstructions.

How many galaxies are required for the inversion? The 
simulations discussed above used 100,000 galaxies, similar 
to the expected number of photometric LRGs over the same 
area of sky. We have however tested the inversion on as 
few as 1000 galaxies, and found that, for appropriate reg- 
ularizations, the algorithm reconstructs the input redshift 
distribution. However, for small samples, the Poisson noise 
in the input photometric redshift distribution can be sig- 
nificant, resulting in a noisier reconstruction (for the same 
redshift resolution). This may be improved by smoothing 
the resulting reconstruction or equivalently, reconstructing 
the redshift distribution on a coarser redshift grid. 

There is an important generalization of this method 
that should be mentioned. We introduced the concept of reg- 
ularization and the penalty function to cure an instability in 
the deconvolution as we attempted a finer resolution of the 
redshift distribution. Phrased differently, the deconvolution 
became unstable when the input became low S/N and
the prior (in the form of the penalty function) compensated
for this loss of information. In the cases considered in this
paper, we have used a relatively weak prior; however, if one
has reliable prior information (e.g. the rough shape of the
redshift distribution), one can easily include that information.
A strong prior will allow one to obtain a solution even
in the low S/N regime. We do however remind the overzealous
reader that the usual caveat about strong priors does
apply in this case; the method cannot distinguish an incorrect
prior, and will get the wrong answer if such a prior is
heavily weighted.

Figure 10. Regularized estimates of dN/dz for different values
of the regularization parameter, λ. In both panels, the black
histogram shows the true redshift distribution. The upper panel
shows the reconstructions for values of λ about the minimum
of Ξ^2; the stars (red), diamonds (green), and crosses (blue)
correspond to λ = 0.1, 0.5, and 1.0 respectively. The lower panel shows
the reconstructions for extreme values of λ; the crosses (red) and
diamonds (green) correspond to λ = 10"^'' and 1000 respectively.
The values of λ have been rescaled as in Fig. 9.



4.3 Application to SDSS Data 

Before applying this algorithm to a photometric sample, we 
test it against the Cut I calibration dataset described in 
SecIO. The results are in Fig. 11. The reconstructed redshift
distribution correctly captures all the broad features of the
true redshift distribution, including correcting for the bias
at z ~ 0.1 and sharpening the dip at z ~ 0.25. Fig. 11 also
highlights the inability of this method to reconstruct sharp
features since these are disfavoured by the smoothness prior
we impose; the inversion works best for broad features. It is
worth emphasizing that most sharp features (including the
feature at z ~ 0.075) are spurious (e.g. binning artifacts).
However, if a sharp feature is physically expected in the 
distribution, the prior must be adjusted to allow for this. 

We conclude this discussion by applying the above algo- 
rithm to the SDSS photometric data. A detailed discussion 
of the construction and properties of the SDSS photometric 
LRG sample will be presented elsewhere; briefly, the sample 
is constructed by applying the photometric selection criteria
(Cut I and Cut II, see Sec. 2.3) to objects classified as
galaxies by the photometric pipeline. We then estimate a
photometric redshift for each of the selected objects using
the simple template fitting code of Sec. 3.1; however, the
results are insensitive to the choice of algorithm. The
photometric redshift distribution is shown in Fig. 12.

Figure 11. Regularized estimate [solid, black] of dN/dz for the
Cut I calibration data [histogram]. The input photometric
redshift distribution is the dashed [red] line. dN/dz is normalized to
integrate to unity.

Figure 12. Regularized estimate [solid, red] of dN/dz for the
LRG sample culled from the SDSS photometric data, compared
with the photometric redshift distribution [histogram, black]. The
redshift distribution is for galaxies with 0.2 < z_photo < 0.6,
indicated by the vertical dashed lines. As before, dN/dz is normalized
to integrate to unity.

One feature of this distribution that deserves some ex- 
planation is the "bump" in the number of galaxies at z ~ 0.7. 
This is inconsistent with being the same population of LRGs 
selected with an apparent magnitude cut. It is unlikely 
that these are a different population at z ~ 0.7, as they 



would have to be a significantly brighter population than 
the LRGs, that only appeared at high redshifts. A more 
likely explanation is that these are faint galaxies at lower 
redshifts scattered to high redshifts by photometric errors. 
This is more likely, in light of the fact that these galaxies
have i ~ 20, g - r ~ 2 and r - i ~ 1, giving them r ~ 21 and
g ~ 23. This is at the very edge of (or beyond) the photometric
completeness of SDSS, and the measurements will have sig- 
nificant photometric errors (~ tenths of a magnitude) . Given 
such photometric errors, it is likely for the more numerous 
low redshift galaxies to be scattered into the LRG colour 
space. Furthermore, the spectral templates that we use are 
not well constrained by observations for redshifts > 0.7. To 
avoid the complications of correcting for such contamination,
we restrict our catalog to z_photo < 0.6. Similarly, as
discussed earlier, the photometric selection breaks down at
low redshifts, and so, we impose a lower redshift cutoff of
z_photo > 0.2. Note that this lower redshift cut is imposed
only to select a uniform sample; the inversion must be (and 
is) performed at all redshifts. However, the small photomet- 
ric redshift error at these redshifts minimizes the contamina- 
tion from these galaxies, effectively truncating the inverted 
distribution at z ~ 0.2. 

We can now apply the inversion algorithm to estimate
the true redshift distribution, using the error distributions
measured in Sec. 3.3. The merit function, Eq. 25, is computed
by bootstrap resampling the actual catalog; the measured Ξ^2
has a form similar to Fig. 9. Using the value of the regularization
parameter, λ, obtained from Ξ^2, we show the estimated
redshift distribution for galaxies with 0.2 < z_photo < 0.6 in
Fig. 12. The underestimation of the photometric redshifts at
high redshifts is immediately apparent from the comparison
of the two distributions. The bumps from z = 0.3 to 0.4 are
a residual artifact of the inversion. These vanish for higher
values of λ, and become stronger for lower regularizations,
for which the reconstructions are also more unstable. The value
of λ used is a balance between this stability and accuracy, as intended.



5 DISCUSSION 

As we discussed in the Introduction, constructing a photo- 
metric redshift catalogue involves three steps - photometri- 
cally selecting a sample for which accurate photometric red- 
shifts can be obtained, measuring the photometric redshift 
error distribution for the resulting sample, and estimating 
the true redshift distribution. This paper describes all stages 
of this process. 

• We describe the selection of a sample of Luminous 
Red Galaxies (LRGs) using the SDSS photometric sam- 
ple. These galaxies are typically old elliptical systems with 
strong 4000 Å breaks in their continua. The shifting of this 
feature through the SDSS filters makes accurate photometric 
redshifts possible. 

• We measure the error distribution of this sample by 
comparing photometric and spectroscopic redshifts for a cal- 
ibration subsample of galaxies culled from the SDSS and 
SDSS-2dF spectroscopic catalogues. The scatter in the
redshifts is approximately σ ~ 0.03 at redshifts less than 0.55,
and increases at higher redshifts due to increased photometric
errors and uncertainties in the templates.

• The accuracy of the photometric redshifts is similar for
the two algorithms we consider, a simple template fit and a
hybrid algorithm that adjusts the template to better fit the
observed colour distribution.

We have specifically used the SDSS photometric sample 
throughout this paper, both as an example and for its intrinsic
interest. However, we emphasize that the entire process
that we describe can be reconstructed for any multi-colour 
imaging survey with appropriate filters. 

Using such a photometric redshift sample requires 
knowing the conditional probability that a galaxy with a 
photometric redshift, z_photo, has a true redshift, z_spectro. 
Given the redshift error distribution, this conditional prob- 
ability can be readily estimated using Bayes' theorem if the 
true underlying redshift distribution is known. Using the 
fact that the photometric redshift distribution is the true 
redshift distribution convolved with redshift errors, we have 
presented a method to deconvolve the errors to estimate 
the redshift distribution. This method is ill-conditioned, and 
therefore, we use a prior on the smoothness of the redshift 
distribution to regularize the deconvolution. We have cali- 
brated the relative weight of this prior by measuring the ac- 
curacy and stability of the recovered redshift distributions, 
and we proposed a general merit function that objectively 
determines this weight. 
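
The use of Bayes' theorem mentioned above can be illustrated with a short sketch: on a grid of true redshifts, the probability of true redshift z given z_photo is proportional to the error distribution times dN/dz. The example assumes, purely for illustration, a single zero-mean Gaussian error distribution with a redshift-independent width, so that its normalization cancels; the function name `posterior_true_z` is hypothetical.

```python
import math


def posterior_true_z(z_phot, z_grid, dndz, sigma):
    """P(z | z_phot) on z_grid, proportional to P(z_phot | z) * dN/dz(z).

    Assumes Gaussian scatter of constant width sigma (an illustrative
    simplification of the measured double Gaussian distributions).
    """
    weights = [n * math.exp(-0.5 * ((z - z_phot) / sigma) ** 2)
               for z, n in zip(z_grid, dndz)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("posterior has no support on the grid")
    return [w / total for w in weights]


z_grid = [0.01 * k for k in range(101)]
rising = [z * z for z in z_grid]  # toy dN/dz growing as z^2
post = posterior_true_z(0.5, z_grid, rising, sigma=0.03)
mean_z = sum(z * p for z, p in zip(z_grid, post))
print(f"posterior mean true redshift = {mean_z:.4f}")
```

Because the toy dN/dz rises with redshift, the posterior mean lies slightly above the photometric redshift; this is why the true redshift distribution, and not only the error distribution, is needed to use the catalogue.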

We conclude with a few comments about this algorithm. 

• The particulars of the sample selection are encoded into 
the photometric error distribution. The method is therefore 
completely general, and applicable to any combination of 
colour selections and photometric redshift cuts. 

• The accuracy of the recovered redshift distribution is 
determined by the accuracy of the input error distributions. 
Therefore, it is essential that the calibration data used to 
measure the error distribution correspond as closely as pos- 
sible to the actual data, both in photometric depth and ac- 
curacy. One can attempt to extrapolate these distributions 
to fainter magnitudes or measure them from simulations, 
but with the caveat that the actual distributions may be 
very different from these, and that the reconstruction could 
potentially be sensitive to these errors. We emphasize that 
this limitation is not unique to this method, but affects all 
analyses that use photometric redshifts. 

• The deconvolution algorithm is formally applicable to 
arbitrary error distributions. However, for complex error
distributions (e.g. multiply peaked distributions), multiple
solutions may exist and there is no guarantee that the method
will converge to the correct solution. This problem is avoided 
here by the use of photometric pre-selection; in general, it 
could also be prevented by the use of priors in the photomet- 
ric redshift estimation. We strongly recommend using one of 
these methods to break photometric redshift degeneracies. 

• An advantage to this method is that the calibration 
data need not sample the redshift range of interest in the 
same manner as the photometric data. It suffices that it 
samples the entire range well enough to measure the error 
distributions. This allows the use of calibration sets from 
heterogeneous samples, as was done in this paper. 

• The inversion algorithm presented in this paper
provides an alternative to iterative deconvolution algorithms
(Lucy 1974; Brodwin et al. 2003). As emphasized by Lucy
(1974), the two methods have very different mathematical



philosophies; iterative methods treat the problem as one in 
statistical estimation, while the philosophy in this paper de- 
rives from the theory of integral equations. However, in the 
high S/N regime, both methods will produce similar results, 
and there is little to distinguish the two. For low S/N, the 
deconvolution problem may not possess a solution, and itera- 
tive methods may not converge. In these cases, the algorithm 
presented in this paper transparently allows the inclusion of 
external information as part of the penalty function to yield 
a meaningful solution. In cases where one possesses reliable 
prior information, one can then refine that information to 
yield a better solution. 